Mining Interesting Infrequent and Frequent Itemsets Based on Minimum Correlation Strength

Author

  • Xiangjun Dong
Abstract

The IMLMS (Interesting MLMS, Multiple Level Minimum Supports) model, proposed in our previous work, is designed to prune uninteresting infrequent and frequent itemsets discovered by the MLMS model. One of the pruning measures used in the IMLMS model, interest, can be described as follows: for two disjoint itemsets A and B, if interest(A,B) = |s(A∪B) − s(A)s(B)| < mi, then A∪B is recognized as an uninteresting itemset and is pruned, where s(⋅) is the support and mi is a minimum interestingness threshold. With this measure, however, it is difficult for users to set the value of mi, because interest(A,B) depends heavily on the values of s(⋅). In this paper, we therefore propose a new measure, MCS (minimum correlation strength), as a substitute. MCS, which is based on the correlation coefficient, performs better than interest, and its value is easy for users to set. Theoretical analysis and experimental results show the validity of the new measure.
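
To make the contrast between the two measures concrete, here is a minimal sketch in Python. The interest function follows the definition given in the abstract. The abstract does not give the exact correlation formula, so the correlation function below assumes the standard correlation coefficient for two binary variables (the phi coefficient), and the pruning rule |ρ(A,B)| < mcs is likewise an assumption. The names support, interest, correlation, is_interesting, the threshold mcs, and the toy transactions are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def interest(a, b, transactions):
    """interest(A,B) = |s(A∪B) - s(A)s(B)|, as defined in the abstract."""
    s_a = support(a, transactions)
    s_b = support(b, transactions)
    s_ab = support(set(a) | set(b), transactions)
    return abs(s_ab - s_a * s_b)


def correlation(a, b, transactions):
    """Phi coefficient of A and B (ASSUMED form; the abstract gives no formula)."""
    s_a = support(a, transactions)
    s_b = support(b, transactions)
    s_ab = support(set(a) | set(b), transactions)
    denom = sqrt(s_a * (1 - s_a) * s_b * (1 - s_b))
    if denom == 0:
        # A or B occurs in all or none of the transactions: correlation undefined.
        return 0.0
    return (s_ab - s_a * s_b) / denom


def is_interesting(a, b, transactions, mcs):
    """Keep A∪B only if |correlation| reaches the threshold (ASSUMED rule)."""
    return abs(correlation(a, b, transactions)) >= mcs


# Toy data, purely illustrative.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b"}, {"a"}, {"b"},
    {"c"}, {"c"}, {"a", "b", "c"}, {"c"},
)]

print(interest({"a"}, {"b"}, transactions))
print(correlation({"a"}, {"b"}, transactions))
print(is_interesting({"a"}, {"b"}, transactions, mcs=0.3))
```

In this toy data, s(A) = s(B) = 0.5 and s(A∪B) = 0.375, so interest is 0.125 while the assumed correlation is 0.5. The interest value is bounded by the supports involved, which is why a fixed mi is hard to choose, whereas the correlation always lies in [-1, 1] regardless of the supports.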


Similar resources

An Efficient Algorithm to Automated Discovery of Interesting Positive and Negative Association Rules

Association rule mining is a very efficient technique for finding strong relations between correlated data. The correlation of data makes the extraction process meaningful. For discovering frequent items and mining positive rules, a variety of algorithms are used, such as the Apriori algorithm and tree-based algorithms. But these algorithms do not consider negation occurrence of the attribute i...


A Survey of Frequent and Infrequent Weighted Itemset Mining Approaches

Itemset mining is a data mining method extensively used for learning important correlations among data. Initially, itemset mining focused on discovering frequent itemsets. Frequent weighted itemsets characterize data in which items may be weighted differently through frequent correlations in the data. But, in some situations, for instance certain cost functions need to be minimized for determining r...


Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees

Itemset mining has been an active area of research due to its successful application in various data mining scenarios including finding association rules. Though most of the past work has been on finding frequent itemsets, infrequent itemset mining has demonstrated its utility in web mining, bioinformatics and other fields. In this paper, we propose a new algorithm based on the pattern-growth p...


Negative and Positive Association Rules Mining from Text Using Frequent and Infrequent Itemsets

Association rule mining research typically focuses on positive association rules (PARs), generated from frequently occurring itemsets. However, in recent years, there has been significant research focused on finding interesting infrequent itemsets leading to the discovery of negative association rules (NARs). The discovery of infrequent itemsets is far more difficult than their counterparts, ...


Mining Frequent Patterns via Pattern Decomposition

• Candidates Generation and Test (Agrawal & Srikant, 1994; Heikki, Toivonen & Verkamo, 1994; Zaki et al., 1997): Starting at k=0, it first generates candidate (k+1)-itemsets from known frequent k-itemsets and then counts the supports of the candidates to determine the frequent (k+1)-itemsets that meet a minimum support requirement.
• Sampling Technique (Toivonen, 1996): Uses a sampling method to select a...



Publication date: 2011